Precise measurement of physical activity is important for health research, providing a better 
understanding of activity location, type, duration, and intensity. This article describes a novel 
suite of tools to measure and analyze physical activity behaviors in spatial epidemiology 
research. We use individual-level, high-resolution, objective data collected in a space-time 
framework to investigate built and social environment influences on activity. First, we collect 
data with accelerometers, global positioning system units, and smartphone-based 
digital travel and photo diaries to overcome many limitations inherent in self-reported data. 
Behaviors are measured continuously over the full spectrum of environmental exposures in 
daily life, instead of focusing exclusively on the home neighborhood. Second, data streams 
are integrated using common timestamps into a single data structure, the "LifeLog." A 
graphic interface tool, "LifeLog View," enables simultaneous visualization of all LifeLog 
data streams. Third, we use geographic information system SmartMap rasters to measure 
spatially continuous environmental variables to capture exposures at the same spatial 
and temporal scale as in the LifeLog. These technologies enable precise measurement of 
behaviors in their spatial and temporal settings but also generate very large datasets; we 
discuss current limitations and promising methods for processing and analyzing such large 
datasets. Finally, we provide applications of these methods in spatially oriented research, 
including a natural experiment to evaluate the effects of new transportation infrastructure 
on activity levels, and a study of neighborhood environmental effects on activity using 
twins as quasi-causal controls to overcome self-selection and reverse causation problems. 
In summary, the integrative characteristics of large datasets contained in LifeLogs and 
SmartMaps hold great promise for advancing spatial epidemiologic research to promote 
healthy behaviors. 

Keywords: accelerometry, behavior, environment, geographic information systems, global positioning systems, 
physical activity 



INTRODUCTION 

The health benefits of regular physical activity are well established, 
including weight control, improved cardiorespiratory fitness, and 
reduced risk of developing chronic diseases such as type 2 dia- 
betes, cardiovascular disease, and some forms of cancer (1-5). 
Despite these recognized benefits, most people in the U.S. do 
not engage in physical activity at levels consistent with recom- 
mendations for health benefits (6, 7). Precise measurement of 
physical activity behaviors, including type, amount, context, and 
place, is essential for increasing physical activity at the popula- 
tion level because it enables a better understanding of where, 
when, and how much activity is or is not occurring. Emerging 
technologies are increasingly being used to improve the preci- 
sion and accuracy of objective physical activity measurement 
and to enable detailed examinations of where and when physical 
activity behaviors actually occur. While these technologies 
greatly advance the field of physical activity research, they also 
present entirely new methodological challenges. For example, the 
large amount of data produced when multiple participants wear 
accelerometers and global positioning system (GPS) devices over 
the course of several days generates new requirements for data 
structure and processing. A typical 7-day period of monitoring 
using a 1-min collection window yields about 10,000 observations 
per person (7 days × 1,440 min per day = 10,080); in one recent study, 
approximately 88% of these data points were dropped because such a 
large dataset could not be handled computationally (8). Because many 
studies collect refined data at high 
temporal resolution, new tools that can deal with such large data 
sets are necessary. In addition, the multiple activity, location, and 
environment data streams need to be integrated into a compre- 
hensive structure that permits combined analyses of behaviors 
in time and space. Although increasing numbers of studies are 
using these integrated technologies, there is little technical guid- 
ance for researchers who want to use these methods in their 
studies. 

This paper introduces a novel suite of data collection instruments, 
data management tools, and analytic methods to measure 
and analyze activity behaviors that have broad applications in spatial 
epidemiology. We focus on individual-level, high-resolution, 
objective data on activity, location, and environment. First, we 
describe and assess a range of instruments used to capture phys- 
ical activity and its location over the course of daily life. These 
instruments include accelerometers, GPS data loggers, and travel 
diaries. Second, we present a set of tools, which were created to 
manage the large data sets generated by accelerometry and GPS. 
The first data management tool is the "LifeLog," which combines 
accelerometry- and diary-based activity and GPS-derived loca- 
tion data streams into a single temporal data structure using a 
common timestamp for data linkage. The LifeLog is in turn com- 
plemented by the "LifeLog View," a graphic display interface tool 
that enables simultaneous visualization of activity and location 
data streams. These tools yield a common spatial-temporal data 
structure for activity and location that is also necessary to investigate 
high-resolution built and social-environmental influences on 
physical activity behaviors. Third, we seek to bypass the limitations 
of past research, which only considered the influence of the home 
environment or "neighborhood" on behavior; instead, we measure 
physical activity across the full spectrum of exposures encountered 
in daily life. To do so, we have developed a new approach to cap- 
ture the attributes of the built and social environments at the 
many locations generated by GPS data. We introduce SmartMaps, 
a tool for environmental data management. SmartMaps are ras- 
terized or grid-based surfaces, which provide spatially continuous 
values of environmental attributes. The fine-grained grid-based 
measures of environment calculated by the SmartMaps serve to 
capture exposures with the same spatial and temporal resolution 
as that obtained by accelerometry and GPS. 

The emerging technologies embodied in the set of instru- 
ments and management tools presented here promise to pre- 
cisely measure and analyze physical activity behaviors in various 
settings over the full spatial-temporal continuum. They have 
been used in a few studies to date, two of which are described 
in this article, including a natural experiment to evaluate the 
effects of new transportation infrastructure on physical activity 
levels and a neighborhood-effects study that features twins 
as quasi-causal controls to overcome self-selection and reverse 
causation problems. Finally, we discuss both the great potential 
and limitations of the tools and methods presented and suggest 
future studies that would further advance spatial epidemiologic 
research. 

MATERIALS AND METHODS 

ACTIVITY BEHAVIOR AND LOCATION DATA COLLECTION INSTRUMENTS 

Our instruments include accelerometers, GPS devices, smart- 
phones, as well as both paper and digital travel diaries. 

Accelerometer 

We use accelerometers to assess physical activity patterns over time, 
configuring them for various purposes. In one study, accelerometers 
were configured to record, at minimum, acceleration in one axis 
(orthogonal to the earth's surface), while in other studies the 
configuration included three-axis accelerometry, steps, incline, and 
ambient light levels. Measurement epochs for the accelerometer 
were set to match GPS recording intervals, ranging from 15 s to 
1 min. 

We use standard off-the-shelf accelerometers such as the Acti- 
Graph GT1M and GT3X models for the objective measurement of 
physical activity. As one explicit example, accelerometry data were 
downloaded using ActiLife software (v3.4.0, ActiGraph LLC., Pen- 
sacola, FL, USA) and exported as comma-separated value (CSV) 
text files containing fields for timestamps and the various sensor 
data streams (i.e., axis counts), text files containing an infor- 
mational header, including starting timestamp, epoch duration, 
and epoch accumulated values ("DAT" format), or native struc- 
tured query language (SQL) format ("AGD" files). Text files were 
imported into a PostgreSQL (9) database, either directly from CSV 
files or using scripts within the statistical program R (10) that pre- 
processed the DAT or AGD data into tables containing one record 
per epoch. Accessing data using R to connect to the SQLite AGD 
files allowed an automated approach for processing multiple files, 
rather than requiring a technician to export individual CSV or 
DAT files using ActiLife software on a per-subject basis. 

Our group currently uses the GT3X+ model and the latest 
version of ActiLife software (v6.8.1). One major innovation is 
that the latest model now collects and stores raw accelerations, 
so that epoch duration can be chosen after data collection at 
the time of data export. This allows accelerometry data to be 
matched to the data collection interval of any other recording 
device. We use accelerometry data as the base table to enforce the 
temporal sequence of the merged dataset containing input from 
multiple instruments. We adopt this approach because, once the 
accelerometer starts collecting data, it continues to record regu- 
larly until the unit runs out of power, reaches the configured "stop 
recording" date/time, or malfunctions, whereas other data collec- 
tion devices may not record regularly or continuously. The unit 
does not permit any participant input (e.g., it has no on/off switch 
or other end-user configuration options) or rely on any other input 
after starting, which reduces participant burden and avoids poten- 
tial user error. Accelerometry activity count data were processed 
to yield time-stamped intensity levels for physical activity using 
commonly accepted thresholds for differentiating activity levels 
(11) and to examine records for the number of complete wearing 
days (7). 
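
The two processing steps described above (choosing an epoch length and applying count thresholds) can be illustrated with a minimal Python sketch. It assumes count records arrive as (timestamp, counts) pairs; the function names and the cut-point values are hypothetical placeholders chosen for illustration and are not the thresholds of reference (11).

```python
from datetime import datetime, timedelta

# Hypothetical counts-per-minute cut-points (lower bound, label).
# These values are placeholders, not the published thresholds.
CUT_POINTS = [(0, "sedentary"), (100, "light"), (2000, "moderate"), (6000, "vigorous")]


def floor_to_epoch(ts, epoch_seconds):
    """Return the start of the epoch that contains timestamp ts."""
    seconds_into_day = ts.hour * 3600 + ts.minute * 60 + ts.second
    return ts - timedelta(seconds=seconds_into_day % epoch_seconds,
                          microseconds=ts.microsecond)


def aggregate_epochs(records, epoch_seconds=60):
    """Sum (timestamp, counts) records into longer epochs, sorted by time."""
    totals = {}
    for ts, counts in records:
        start = floor_to_epoch(ts, epoch_seconds)
        totals[start] = totals.get(start, 0) + counts
    return sorted(totals.items())


def intensity(counts_per_minute):
    """Label a 1-min count total with the highest cut-point it reaches."""
    label = CUT_POINTS[0][1]
    for lower_bound, name in CUT_POINTS:
        if counts_per_minute >= lower_bound:
            label = name
    return label
```

For example, four 15-s records with counts of 10, 20, 30, and 40 within the same minute aggregate to a single 1-min epoch of 100 counts, which this sketch would label "light" under the placeholder cut-points.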

Using accelerometer count thresholds for estimating physical 
activity intensity is problematic because these a priori defined 
thresholds do not necessarily take into account individual-level 
biometric differences, such as variation in body size or aerobic 
fitness level, and they do not allow for the estimation of physical 
activity type or context. Promising work is being conducted using a 
variety of novel methods, including quadratic discriminant analy- 
sis and hidden Markov models (HMM) to recognize common 
physical activities (12), as well as machine-learning algorithms 
that exploit artificial neural networks (13, 14). Our own work with 
these novel methods is described briefly in Section "Multi-Sensor 
Board" below. Indeed, the measures proposed herein may be used 
as validation strategies for such algorithms. The "packaging" of 
these algorithms within easily used software will help researchers 
who are measuring activity levels with accelerometry but who have 
little experience in software development. 

Global positioning systems 

Our ongoing studies use GPS data loggers to record geospatial 
locations so that we can assess the spatial and temporal character- 
istics of travel and "dwell" patterns (e.g., sojourn at a home or work 
location), including characteristics of specific travel modes. We 
explain how we conflate the GPS and accelerometry data below in 
Section "Data Integration, Management, and Visualization Tools." 

We currently use off-the-shelf models such as the GlobalSat 
(New Taipei City, Taiwan) DG-100 that is equipped with the SiRF 
Star III/LP 20-channel chipset, and the Qstarz (Taipei, Taiwan) BT-1000XT 
that contains the MTK 51-channel chipset. Both models 
feature solid-state memory and rechargeable batteries that allow 
at least one full day of measurement per charge and up to several 
weeks of data storage, depending on recording interval and data 
type. 

The DG-100 manual states its accuracy as 10 m, whereas the 
stated accuracy of the BT-1000XT is 3 m. The DG-100 can record 
a maximum of only 5 values per record, including position, timestamp, 
speed, and altitude, whereas the BT-1000XT can record up 
to 19 values, including the previous 4, as well as data quality vari- 
ables such as dilution of precision, number of satellites used in the 
fix, satellite position, and signal-to-noise ratio. 

We collect data in binary format and export them as CSV files, 
with one record per logging interval during which a fix was deter- 
mined (at least four satellites in view and a horizontal dilution 
of precision less than eight). Consumer-level GPS units such as 
the DG-100 and BT-1000XT can be configured to log at regular 
intervals, such as 15 s, but they begin recording as soon as a fix is 
obtained (rather than at a time evenly divisible by 15 s) and store 
the next record after the configured interval has elapsed. 
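
Because logging starts at an arbitrary second, GPS timestamps rarely fall on the same boundaries as accelerometer epochs. The following minimal Python sketch shows one way to apply the fix criteria stated above (at least four satellites and a horizontal dilution of precision below eight) and to assign each retained fix to a 15-s epoch. The dictionary keys `time`, `n_sat`, and `hdop` are hypothetical field names, and keeping the first valid fix per epoch is an assumption made for illustration.

```python
from datetime import datetime, timedelta


def is_valid_fix(fix, min_satellites=4, max_hdop=8.0):
    """Apply the fix criteria: enough satellites and a low enough HDOP."""
    return fix["n_sat"] >= min_satellites and fix["hdop"] < max_hdop


def epoch_start(ts, epoch_seconds=15):
    """Return the start of the epoch that contains timestamp ts."""
    seconds_into_day = ts.hour * 3600 + ts.minute * 60 + ts.second
    return ts - timedelta(seconds=seconds_into_day % epoch_seconds,
                          microseconds=ts.microsecond)


def fixes_by_epoch(fixes, epoch_seconds=15):
    """Key valid fixes by epoch start; the first valid fix in an epoch is kept."""
    result = {}
    for fix in sorted(fixes, key=lambda f: f["time"]):
        if is_valid_fix(fix):
            result.setdefault(epoch_start(fix["time"], epoch_seconds), fix)
    return result
```

A fix logged at 08:00:07 would be assigned to the epoch beginning at 08:00:00, which allows it to be matched to an accelerometer record carrying that timestamp.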

The GPS data are processed and stored in a PostgreSQL data- 
base enabled with PostGIS, the spatial data storage and analysis 
extension (15). Longitude and latitude coordinates are used to 
generate spatial point features for mapping and spatial analysis. 
Unlike the data structure obtained from accelerometry, GPS data 
frequently contain large intervals without data, caused by signal 
reception failure due to such factors as obstruction of line-of- 
sight with GPS satellites, powering down during recharging, or 
cold starts (delays between starting up and acquiring a satellite 
signal). 
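
Intervals without GPS data can be located before integration by comparing consecutive timestamps. The sketch below is a minimal Python illustration, assuming a nominal 15-s logging interval; the function name and the 5-s tolerance are hypothetical choices.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, interval_seconds=15, tolerance_seconds=5):
    """Return (last_fix_before_gap, first_fix_after_gap) pairs.

    A gap is reported when two consecutive fixes are further apart than
    the logging interval plus a small tolerance.
    """
    limit = timedelta(seconds=interval_seconds + tolerance_seconds)
    ordered = sorted(timestamps)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > limit]
```

The returned pairs mark periods for which accelerometry records will have no matching location, for instance while the unit was recharging or during a cold start.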

Multi-sensor board 

Our team also uses a multi-modal sensor known as the multi- 
sensor board (MSB), which was developed by researchers at the 
University of Washington in collaboration with Seattle Intel Labs. 
This is a pager-sized device worn clipped to a belt (16). It offers a 
suite of features, including multiple sensing (three-axis accelerom- 
etry, barometric pressure, humidity, temperature, light, audio, 
and GPS), data storage, communication, and local computation 
capabilities. Rather than outfitting study participants with several 
different (separate) devices, the MSB records multiple sensor data 
streams simultaneously. Its functionality yields notable benefits; 
participants need to wear and recharge only one device, and each 
variable is recorded in a single binary file, rather than in several 
files that need management and conflation after download. 

As an experimental device, the MSB has various drawbacks, 
such as limited data storage, limited battery life, and the need 
for expert staff to configure the devices and to download and 
transform the multiple data streams. Despite these limitations, 
it enabled us to develop sophisticated machine-learning algo- 
rithms to quantify physical activity types and estimate corre- 
sponding energy expenditures that were subsequently validated 
in laboratory and field experiments (17, 18). 

The advantages of single devices that have multiple sensors 
and capabilities, such as the MSB and mobile phones, 
make them an important area for further development and eventual 
deployment. Although we are currently using stand-alone 
accelerometers, GPS devices, and mobile phones in many of our 
research projects, we are benefiting from our previous validation 
work and using our machine-learning algorithms to obtain richer 
data on activity amount (i.e., specific activity types and associated 
energy expenditures) than can be provided by accelerometry and 
GPS alone. 

Travel diary instruments 

Our research agenda is driven by objective data sources. However, 
we have found that an important set of behavioral data is not yet 
available solely through objective measurement. Data for behav- 
ioral variables or characteristics such as activity purpose, visited 
place names and addresses, and certain modes of travel between 
places cannot, in general, be collected without some user input. 
Other activities that are difficult to determine, such as walking or 
jogging on a treadmill or using a stationary bicycle or elliptical 
machine, would likely require substantial work to be identified 
solely from objective data. 

Other behavioral variables are impractical or impossible to 
measure with existing instruments. For example, although some 
devices, such as the ActiGraph GT3X+, are water resistant, most 
current electronic devices, including GPS units, must be removed 
during bathing or swimming, preventing the recording of such 
activities. In addition, objectively sensing behaviors such as eat- 
ing and food shopping would require the development of new 
instruments and data processing methods. Given the lack of such 
instruments, but also the need for obesity-related research to estimate 
where and when all exercise, travel, and food-related behaviors 
occur, we created several travel and food diary instruments. 
For each visited place, key variables include place name, address, 
arrival and departure time, arriving travel mode, and activity or 
purpose. 

Paper version of travel diary. We originally created paper book- 
lets with enough blank pages to account for 14 places per day, with 
extra pages for additional places and days. Participants logged 
place names, addresses, times of arrival and departure, activities at 
each place, and mode of travel from place to place. An example of 
a paper travel diary that we have used in our research is shown in 
Figure 1. 

We wrote a custom Microsoft Access database (MDB) application 
to facilitate transcription from the paper diary to a digital 
format. This database automatically links participants, recording 
days, and place records (see Figure 2). Each participant's data are 
stored in a hierarchy identified and linked by participant ID, day 
number, and record number. The application uses two separate 
MDB files, one containing the data and the other with the forms 
and Visual Basic for Applications (VBA) code. The "code" database 
uses the Linked Table Manager in Access to display the data tables, 
which are actually stored in the separate "data" MDB file. This 
structure permits updates to the code database without the need 
for copying data tables. 

FIGURE 1 | Paper travel diary for two places in a single travel day. 

The code database contains forms that allow easy naviga- 
tion among records for participants, participant days, and places 
as illustrated in Figure 3. The data entry form for place data 
contains VBA code for simple error checking of intra-place 
records. For example, if a place record has the "time arrived" 
later than the "time left," a warning is generated similar to that 
shown in Figure 4. The code database also contains queries 
that display inter-place error checks (e.g., if the "time left" for 
place 1 is later than the "time arrived" at place 2), allowing 
data entry staff to review and correct inter-place sequencing 
errors. 
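
The two kinds of checks described above can be sketched in a few lines of Python. This is an illustration only, not the VBA code or Access queries used in the application; the dictionary keys `arrived` and `left` are hypothetical, and a missing time (e.g., no arrival time for the first place of the day) is represented as `None`.

```python
from datetime import datetime


def intra_place_errors(places):
    """Return place numbers whose arrival time is later than the departure time."""
    return [p["placenum"] for p in places
            if p["arrived"] is not None and p["left"] is not None
            and p["arrived"] > p["left"]]


def inter_place_errors(places):
    """Return (earlier, later) place-number pairs that are out of sequence.

    A pair is flagged when the departure from the earlier place is later
    than the arrival at the next place.
    """
    ordered = sorted(places, key=lambda p: p["placenum"])
    return [(a["placenum"], b["placenum"])
            for a, b in zip(ordered, ordered[1:])
            if a["left"] is not None and b["arrived"] is not None
            and a["left"] > b["arrived"]]
```

Applied to a record with an arrival time of 12:27 and a departure time of 12:20, `intra_place_errors` flags that place, which corresponds to the situation that triggers the warning in the data entry form.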

Digital versions of travel diary. Although our paper diary 
was easy to create, edit, and administer, the tedious transcrip- 
tion process, which used the Access database, was vulnerable to 
errors. Quality control can expose errors (e.g., a.m. and p.m. 
substitutions, transposed numerals, missed records), but each 
potential error required manual review to determine whether it 
originated in the participant's initial recording or in the transcrip- 
tion process. Furthermore, because data entry and data processing 
were performed by different study staff, interpreting errors often 
required communication between research staff (at the same or 
across sites) and retrieving paper documents from archives. 

To avoid the logistical problems associated with paper diaries, 
we wrote two separate travel diary applications for the Android 
smartphone platform. Data collected by smartphone do not 
require transcription, and error-checking can be built into the 
application, with immediate feedback asking the user to correct 
impossible entries (e.g., leaving a place before arriving at that 
place). This approach minimizes or obviates the need for temporal 
error checking in post-processing. 

FIGURE 2 | Travel log database schema for data entry in Microsoft 
Access. 

FIGURE 3 | Travel log database entry forms. Upper panel: common places; lower panel: a single place record. 

FIGURE 4 | System generated warning for illogical timestamps. 

My footprints. We initially used the Footprints application for 
HTC (New Taipei City, Taiwan) Android phones to record the 
time and location of specific activities (exercise, eating, and food 
shopping), to encourage participants to create diary records at 
the time specific activities occurred. This application works by 
enabling the user to take a digital photo that is automatically tagged 
with timestamp and location by the phone's locational sensor and 
then manually tagged with other user-entered variables. However, 
Footprints offered few options for configuration. For example, 
the values for the "activity" variable were pre-populated and not 
editable, so that it was impossible to record various activities of 
interest (e.g., food shopping) without resorting to the open-ended 
"comment" variable. 

Instead, we wrote a separate application named My Footprints 
to be more directly useful in our research. A record in My Footprints 
is illustrated in Figure 5, which includes the digital photo filename, 
an automatically generated timestamp, spatial coordinates for the 
location where the photo was taken (although not shown in this 
image capture), and one of four different activities. Data collected 
with My Footprints can be directly transferred from a smartphone 
to the PostgreSQL database. 



Smartphone-based travel diary. We also pilot-tested the Memento 
database application for Android phones as a place-based travel 
diary. This highly flexible and configurable app was able to store 
all our required fields. However, when data were exported, place 
records appeared in a seemingly random order, rather than in the 
order in which they were visited. This is problematic because place 
sequencing is a basic functional requirement for our research ques- 
tions. Proper sequencing of places is not an issue for paper travel 
logs. Rows in the log are numbered sequentially, so we can assume 
that participants record places in the correct order, and that place 
numbers are transcribed accordingly. 

Because we were unable to find an effective way to correct 
the sequencing problem in Memento, we created a second Android 
app simply called Travel Diary. This application allowed recording 
and reordering of days and places (shown in Figures 6A,B), place 
name, address, time arrived and left (Figures 6C-E), and travel 
mode and activity (Figures 6F,G). 

Travel diary processing. Whether paper or digital, the travel diary 
uses place as the unit of measure. Instead of being stored as data, 
trips are created as the temporal interstices between places, and 
generated for each successive pair of place records. A set of R scripts 
converts the travel diary data into format-standardized CSV files. 
These files are uploaded to the PostgreSQL database for integra- 
tion with the GPS and accelerometry data as described in Section 
"Data Integration, Management, and Visualization Tools" below. 

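The derivation of trips from successive place records can be sketched as follows. This is a minimal Python illustration rather than the R scripts used in our processing; the dictionary keys are hypothetical, and the sketch assumes that the travel mode stored with a place is the mode used to arrive there.

```python
def derive_trips(places):
    """Create one trip for each successive pair of place records.

    Each place is a dict with hypothetical keys 'placenum', 'arrived',
    'left', and 'travel_mode' (the mode used to arrive at that place).
    """
    ordered = sorted(places, key=lambda p: p["placenum"])
    trips = []
    for origin, destination in zip(ordered, ordered[1:]):
        trips.append({
            "from_place": origin["placenum"],
            "to_place": destination["placenum"],
            "depart": origin["left"],          # trip starts when the origin is left
            "arrive": destination["arrived"],  # trip ends on arrival at the next place
            "mode": destination["travel_mode"],
        })
    return trips
```

A day with three place records therefore yields two trips, and a day spent at a single place yields none.
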
BUILT ENVIRONMENT MEASURES USING SMARTMAPS 

Using GPS to capture location information generates very large 
amounts of data. We needed a novel approach to effectively measure 
built environment characteristics at any or all GPS-derived 
locations recorded from participants. Previous approaches have 
used spatial buffers around participants' geocoded residential 
addresses to extract and summarize geographic information sys- 
tem (GIS) data within the local neighborhood, storing values as 
individual-level variables (19). However, this point-centric measurement 
approach requires a substantial amount of data processing 
for each location. It is also too computationally intensive 
to be practical for large GPS datasets collected under participant 
free-roaming conditions. 

FIGURE 5 | Examples of My Footprints screen captures. Far left panel: capturing an image with the mobile phone camera; center-left panel: tagging the 
picture as "eating;" center-right panel: review of image and tag; far right panel: overview of recorded activities. 

To address these issues, rather than performing point-centric 
measures of the built environment at all GPS locations, we cre- 
ated SmartMaps for each built environment attribute of interest. 
SmartMaps are raster layers (20), that is, spatially continuous 
surfaces of grid cells, which enable efficient measurements at 
any number of locations within a study area. The point value at 
each SmartMap cell represents a summary of the local neighbor- 
hood value around that cell. SmartMaps provide the same built 
environment values as those generated by the traditional buffer 
method. However, instead of recording neighborhood summaries 
at specific, predefined point locations, SmartMaps do so for every 
cell, continuously across space, thereby enabling measures at any 
location in the study area. 

SmartMaps are created by focal raster processing. The area of 
interest (in our case, King County, WA, USA) is represented as a 
grid of 30 m × 30 m cells, a resolution that has been shown to represent 
urban and suburban parcels with sufficient spatial fidelity 
(21). Each focal cell in the grid is processed independently by using 
the ArcGIS Spatial Analyst Extension. The software performs pre- 
scribed calculations for the neighborhood around the focal cell, 
places the resulting value on that cell, and then moves on to the 
next cell, repeating the process until values are calculated for all 
cells. In our current studies, we use a radius of 833 m to repre- 
sent the focal "neighborhood," corresponding to the distance that 
can be walked in 10 min. For example, to calculate a SmartMap of 
the count of residential units within 833 m of a specified grid cell, 
parcels are first converted into a raster grid in which cell values 
represent the fraction of residential units within the cell (e.g., a 
9000 m² parcel containing 20 residential units yields 10 cells with 
a value of 2 units per cell). The process then sums the values of all 
cells within each focal buffer to represent the number of residential 
units in that focal cell's neighborhood. SmartMap cell values 
can then be extracted for GPS points by using the ArcGIS Surface 
Spot analytical method. 
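
The focal sum described above can be illustrated with a minimal pure-Python sketch. It is not the ArcGIS Spatial Analyst implementation; it assumes that a cell belongs to the neighborhood when its center lies within the radius of the focal cell's center, and the function names are hypothetical.

```python
import math

CELL_SIZE_M = 30.0  # cell edge length used in the text


def units_per_cell(parcel_area_m2, units, cell_size_m=CELL_SIZE_M):
    """Spread a parcel's residential units evenly over the cells it covers."""
    n_cells = parcel_area_m2 / (cell_size_m * cell_size_m)
    return units / n_cells


def focal_sum(grid, radius_m, cell_size_m=CELL_SIZE_M):
    """Sum cell values within radius_m of every cell (circular neighborhood)."""
    n_rows, n_cols = len(grid), len(grid[0])
    reach = int(radius_m // cell_size_m)
    # Row/column offsets of cells whose centers fall inside the circle.
    offsets = [(dr, dc)
               for dr in range(-reach, reach + 1)
               for dc in range(-reach, reach + 1)
               if math.hypot(dr, dc) * cell_size_m <= radius_m]
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for c in range(n_cols):
            total = 0.0
            for dr, dc in offsets:
                rr, cc = r + dr, c + dc
                # Cells outside the grid contribute nothing.
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    total += grid[rr][cc]
            out[r][c] = total
    return out
```

With the example from the text, `units_per_cell(9000, 20)` gives 2 units per cell, and calling `focal_sum` with a radius of 833 m on the resulting grid would give the residential unit count in each cell's neighborhood. A production workflow would use optimized raster software rather than nested loops.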

For our studies, we have generated SmartMaps that charac- 
terize elements of the built environment. These SmartMaps cover 
domains that past research has associated with physical activity and 
obesity. For example, neighborhood composition could be represented 
by counts or densities of employees and residential units (22, 
23). Utilitarian or recreational destinations could be captured as 
counts or densities of supermarkets, fast food outlets, traditional 
restaurants, coffee shops, fitness facilities (24, 25), or by count of 
parks, etc. (8, 26). Transportation infrastructure is measured as 
density of intersections, streets, urban trails, etc. (23, 27, 28). Traf- 
fic conditions are represented by estimated traffic volumes (23) 
and bus ridership as a measure of transportation system load (29). 

Each one of our SmartMaps of the 5975 km² area of King 
County contains more than 6.8 million 900 m² (30 m × 30 m) 
cells, with each cell providing values for the various built envi- 
ronment variables in the associated neighborhood. A SmartMap 
of the count of residential units within 833 m of each cell is shown 
in Figure 7. 

FIGURE 6 | Travel Diary screen captures. From left to right, top to bottom: (A) layout of place records; (B) editing mode for places within a day; (C) address 
place data entry; (D) assigning timestamps; (E) selecting activity for a place; (F) selecting predefined location; (G) entering open-ended activity. 

Using SmartMaps to obtain environmental measures for point 
locations is considerably more efficient than performing a series 
of point-centric buffer analyses. For the 3.8 million GPS locations 
that we collected in one study, less than 1 h per SmartMap 
was required to extract built environment data in the form of 
summaries of each 833 m neighborhood (30). Although creating 
SmartMaps for an area requires substantial effort, the resulting 
rasters can readily be used in any subsequent study to analyze 
point measures of the environment within a specified area. 
SmartMaps are essential for the growing number of studies that 
use geolocation technologies to track individual movements. Ideally, 
urban areas would develop sets of SmartMaps for use by 
multiple agencies or research entities that examine the effects 
of built environment on behavior. Similar efforts have already 
been made in fields such as meteorology and noise mitigation 
(31). Furthermore, SmartMaps can be archived from data sources 
measured at different points in time for use in longitudinal 
studies. 
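
The efficiency argument above can be made concrete with a minimal sketch. It is written in Python rather than in the GIS software we used, and the function names, the toy grid, and the convention that row 0 lies at the southern edge of the raster are all assumptions made for illustration. The neighborhood summary is computed once per cell, so that extracting a value for any point location reduces to a single array lookup.

```python
import math

CELL_SIZE_M = 30.0   # cell edge length, as in the text
RADIUS_M = 833.0     # neighborhood radius, as in the text


def build_smartmap(unit_counts, cell_size=CELL_SIZE_M, radius=RADIUS_M):
    """Return a raster in which each cell holds the sum of unit_counts over
    all cells whose centers lie within `radius` of that cell's center."""
    n_rows, n_cols = len(unit_counts), len(unit_counts[0])
    reach = int(radius // cell_size)  # largest cell offset that can fall inside the radius
    # Precompute the (row, col) offsets of the circular window once.
    window = [
        (dr, dc)
        for dr in range(-reach, reach + 1)
        for dc in range(-reach, reach + 1)
        if math.hypot(dr * cell_size, dc * cell_size) <= radius
    ]
    smartmap = [[0] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for c in range(n_cols):
            total = 0
            for dr, dc in window:
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    total += unit_counts[rr][cc]
            smartmap[r][c] = total
    return smartmap


def extract(smartmap, x, y, x_min=0.0, y_min=0.0, cell_size=CELL_SIZE_M):
    """Look up the precomputed neighborhood value for a point (x, y) in meters.
    Assumes row 0 starts at y_min and column 0 starts at x_min."""
    col = int((x - x_min) // cell_size)
    row = int((y - y_min) // cell_size)
    return smartmap[row][col]


# Toy 5 x 5 raster with a single cell containing 10 residential units.
toy_counts = [[0] * 5 for _ in range(5)]
toy_counts[2][2] = 10
toy_map = build_smartmap(toy_counts, radius=30.0)  # small radius for the toy grid
value_at_point = extract(toy_map, 75.0, 75.0)
```

The costly step (`build_smartmap`) is run once per variable and study area, whereas `extract` is repeated for every GPS location at negligible cost; a point-centric buffer analysis would instead repeat the neighborhood summation for each point.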

DATA INTEGRATION, MANAGEMENT, AND VISUALIZATION TOOLS 

We created tools to manage and integrate the massive data streams
collected by devices in order to examine relationships between
exposures and behaviors. These include LifeLogs and rasterized
SmartMaps.

LifeLogs 

Common timestamps are the "glue" that enables our three basic 
datasets (accelerometry, GPS, and travel diary) to come together. 
Each record from each data source is stored with an explicit timestamp,
and the tables are joined on common timestamps or time ranges
to form combined tables known as LifeLogs. A graphical work
flow to create LifeLogs is shown in Figure 8; the basic SQL code 
for creating a LifeLog is provided in Example 1 in Supplementary 
Material.

FIGURE 7 | A SmartMap of residential unit density. Values are count of
residential units per acre within 833 m of each raster cell. 



Although creating LifeLogs from individual constituent tables is 
simple in PostgreSQL, some issues need to be addressed to ensure 
that tabular relationships are sound (e.g., all devices must have 
their "clocks" aligned). 
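
The timestamp-based join can be sketched as follows. This is not the PostgreSQL code from the Supplementary Material; it is a minimal illustration using Python's built-in `sqlite3` module, and the table names, column names, and toy records are hypothetical. Accelerometry and GPS records are joined on equal timestamps, whereas travel diary records are joined on time ranges.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accel (ts INTEGER PRIMARY KEY, counts INTEGER);
    CREATE TABLE gps   (ts INTEGER PRIMARY KEY, lat REAL, lon REAL);
    CREATE TABLE diary (ts_start INTEGER, ts_end INTEGER, place TEXT);
""")

# Hypothetical records; ts values are seconds on a shared clock.
conn.executemany("INSERT INTO accel VALUES (?, ?)",
                 [(0, 12), (30, 850), (60, 1400), (90, 0)])
conn.executemany("INSERT INTO gps VALUES (?, ?, ?)",
                 [(30, 47.6101, -122.3301), (60, 47.6105, -122.3310)])
conn.executemany("INSERT INTO diary VALUES (?, ?, ?)",
                 [(0, 45, "home"), (45, 120, "walk to bus stop")])

# Accelerometry is the backbone; GPS joins on equal timestamps and the
# diary joins on half-open time ranges [ts_start, ts_end).
lifelog = conn.execute("""
    SELECT a.ts, a.counts, g.lat, g.lon, d.place
    FROM accel AS a
    LEFT JOIN gps   AS g ON g.ts = a.ts
    LEFT JOIN diary AS d ON a.ts >= d.ts_start AND a.ts < d.ts_end
    ORDER BY a.ts
""").fetchall()

for row in lifelog:
    print(row)
```

Left joins keep every accelerometry record even when no GPS fix or diary entry exists for that moment, so gaps in one data stream remain visible rather than silently dropping rows.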

Time zones. Each moment in time can be represented as a time- 
stamp. Timestamps can be rounded to the nearest second with no 
loss of information to provide the level of precision needed in this 
type of spatial epidemiologic research. A commonly accepted stan- 
dard is the number of seconds elapsed since January 1, 1970, 00:00 
UTC (Coordinated Universal Time, or Greenwich time zone); 
this is often called "Unix time." Several factors can introduce 
errors in timestamps. Although one of the benefits of UTC is that 
each moment can be represented unambiguously, errors result if 
time zones are not explicitly specified and handled. The R script 
shown in Example 2 in Supplementary Material illustrates how 
a timestamp can be handled to account for specific time zones. 
PostgreSQL has similar functionality. 

When datasets containing timestamps are passed from one soft- 
ware package to another, careful attention is required to avoid 
errors resulting from conversions that assume that timestamps are 
stored in local time. 
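
As an illustration of explicit time zone handling (in Python rather than the R script of Example 2), the sketch below converts a Unix timestamp to UTC and then to local time. The choice of the `America/Los_Angeles` zone is an assumption based on the King County study area, and the `zoneinfo` module requires Python 3.9 or later with a time zone database available on the system.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

unix_seconds = 1232072572  # seconds elapsed since 1970-01-01 00:00 UTC

# Interpret the value explicitly as UTC; never rely on the machine's local zone.
utc_time = datetime.fromtimestamp(unix_seconds, tz=timezone.utc)

# Convert to the study area's local time only for display or day assignment.
local_time = utc_time.astimezone(ZoneInfo("America/Los_Angeles"))

print(utc_time.isoformat())    # 2009-01-16T02:22:52+00:00
print(local_time.isoformat())  # 2009-01-15T18:22:52-08:00
```

Both objects denote the same instant, but the calendar date differs between them, which is exactly the kind of discrepancy that arises when a timestamp is assumed to be in local time.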

Daylight saving time. Across the U.S. and in many regions world- 
wide, daylight saving time is used to increase the number of 
daylight hours after the work day in summer. When clocks are 
set to change ("spring ahead" or "fall back"), they either lose or 
gain an hour. Unless completely specified timestamps are used 
with software that properly handles daylight saving time, errors 
are possible in measuring intervals that span the moment when 
daylight saving time begins or ends. The software packages used for
LifeLog data processing and storage, R and PostgreSQL, correctly
account for daylight saving time transitions as shown in Example
3 in Supplementary Material, but other software may not.
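
The pitfall can be demonstrated with a short Python sketch (an illustration only, not the code of Example 3). In Python, subtracting two timestamps that carry the same time zone object yields the wall-clock difference and ignores the daylight saving time transition; converting both to UTC first yields the true elapsed time. The interval below spans the 2009 "spring ahead" transition in the U.S. Pacific time zone, which is an assumed example.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

PACIFIC = ZoneInfo("America/Los_Angeles")

# Clocks moved from 02:00 to 03:00 local time on 8 March 2009.
start = datetime(2009, 3, 8, 1, 30, tzinfo=PACIFIC)  # 01:30 PST
end = datetime(2009, 3, 8, 3, 30, tzinfo=PACIFIC)    # 03:30 PDT

# Same-zone subtraction compares wall-clock readings only.
wall_clock_gap = end - start

# Converting to UTC first gives the time that actually elapsed.
elapsed = end.astimezone(timezone.utc) - start.astimezone(timezone.utc)

print(wall_clock_gap)  # 2:00:00
print(elapsed)         # 1:00:00
```

Storing timestamps in UTC, and converting to local time only when needed, avoids this class of error regardless of the software used.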

Analytic boundary for days. Midnight marks the transition 
between calendar days, but many people are active past midnight. 
In order to assign periods of activity to a behaviorally based unit, 
we decided to use 03:00 as the transition between analytic
days. Any activity occurring between 00:00 and 02:59 was assigned
to the previous calendar day. The simulated example in Example
4 in Supplementary Material shows the day transition after
02:50.
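
One simple way to implement such a boundary, sketched here in Python under the assumption that timestamps have already been converted to local time, is to shift each timestamp back by 3 h and take the calendar date of the result. The function name is hypothetical.

```python
from datetime import datetime, timedelta

DAY_START_HOUR = 3  # analytic days run from 03:00 to 02:59


def analytic_day(local_time):
    """Return the analytic day (a date) to which a local timestamp belongs."""
    return (local_time - timedelta(hours=DAY_START_HOUR)).date()


late_night = analytic_day(datetime(2009, 1, 16, 2, 59))     # still 15 January
early_morning = analytic_day(datetime(2009, 1, 16, 3, 0))   # now 16 January
print(late_night, early_morning)
```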

Timestamp rounding. Accelerometry timestamps are typically 
collected at regular intervals, such as 10, 15, 30, or 60 s. The GPS 
units are also configured to record at set intervals, but the actual 
time of acquisition is often more sporadic, depending on when the 
GPS unit can obtain a satellite fix. Therefore, to relate accelerom- 
etry records with GPS records, the records that are the closest in 
time in each dataset should be matched. One approach to match- 
ing is to loop through the accelerometry records and find the GPS 
record with the closest timestamp; however, this method is inef- 
ficient from a processing perspective. A better approach is first to 
determine the interval of the accelerometry recording and then 
to round the GPS timestamps to the same interval. Because sev- 
eral different GPS timestamps might round to the same value 
(e.g., 00:01 and 00:02 both round to 00:00), the GPS table is 
truncated to include records with unique rounded timestamps. 
Truncation should give precedence to the GPS timestamp clos- 
est to the rounded timestamp and delete other candidate matches 
(e.g., matching 00:00 with candidates 00:01 and 00:02 would retain 
00:01 and delete 00:02). For ties (e.g., 00:59 and 01:01, which are
equally close to 01:00), PostgreSQL will select the first matching
record in internal tabular order.
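
The rounding and selection steps can be sketched in Python as follows. This is an illustration of the logic rather than the SQL of Examples 5 and 6; timestamps are assumed to be integer Unix seconds, the function names are hypothetical, and ties are resolved in favor of the record encountered first.

```python
def round_to_interval(ts, interval):
    """Round a Unix timestamp (integer seconds) to the nearest multiple of
    `interval` seconds, with exact halves rounded up."""
    return ((ts + interval // 2) // interval) * interval


def dedupe_rounded(gps_timestamps, interval):
    """Keep one raw GPS timestamp per rounded timestamp: the one closest to
    the rounded value, or the first encountered in case of a tie."""
    best = {}
    for ts in gps_timestamps:
        rounded = round_to_interval(ts, interval)
        if rounded not in best or abs(ts - rounded) < abs(best[rounded] - rounded):
            best[rounded] = ts
    return sorted(best.items())


# 2009-01-16 02:22:52 UTC rounds to 02:23:00 with a 30 s interval (raw minus rounded = -8 s).
raw = 1232072572
rounded = round_to_interval(raw, 30)

# Raw times 1, 2, 29 s compete for 0 s; 31 and 61 s compete for 60 s.
matches = dedupe_rounded([1, 2, 29, 31, 61], 60)
print(raw - rounded, matches)
```

Because each GPS record is visited once, the cost grows linearly with the number of records, in contrast to searching the GPS table anew for every accelerometry record.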

The SQL code for generating rounded timestamps shown in 
Example 5 in Supplementary Material allows the use of any inter- 
val. An example of the rounding function is also shown in Example 
6 in Supplementary Material, which is based on a single participant's
data with a subset of results shown in Table 1. The raw GPS
dataset for this study participant consisted of 29,382 records, but 
after rounding and selecting unique timestamp-rounded records, 
the resulting table contained 17,073 records. For a table of this 
size, the processing time was < 1 s on a RedHat Linux machine 
with a 64-bit Intel Xeon E31270 3.40 GHz processor and 16 GB of 
RAM. 

LifeLog View 

LifeLog Views provide multiple illustrations of complex data
derived from the LifeLog. Data for a walking bout are illustrated
in Figure 9. The left panel shows accelerometry, GPS, and place
and trip information within the same temporal X-axis graph,
created using R. The map portion of the LifeLog View (right
panel) shows all GPS locations for a given study participant, with
bout-specific GPS locations in red, and was created using University
of Minnesota MapServer software¹. A green line identifies
the minimum bounding circle drawn around 95% of the most
tightly clustered points in the bout. Participant ID, sequential
bout number, and activity type are printed as the main title of
the image (top left). Each component image (graph and map)
was created using automated scripts, and the images were automatically
mosaicked using ImageMagick². Each data element in
LifeLog View is useful for developing empirically based tolerances
for activity classification. A second LifeLog View shows one
combined accelerometry/GPS/diary graph per day (right panel)
and GPS locations (left panel) collected over 1 week (Figure 10).
LifeLog Views were instrumental in developing and validating the
algorithms used to classify bouts of walking (32).

¹ http://mapserver.org/

FIGURE 8 | Illustration of LifeLog work flow.

Analytical tools 

Data compiled as LifeLogs can be used for many purposes. Because 
LifeLogs contain original data from all sources (accelerometry, 
GPS, and travel diary), they can be used as a spatially and
temporally explicit database for analyzing, graphing, and mapping
activities and locations, either as separate or combined components.
Possible analyses are briefly discussed using the identification of 
physical activity and walking bouts as examples. Also presented is 
a new tool to graph and map all data in the LifeLog. 



² http://www.imagemagick.org



Physical activity bouts from accelerometry. Accelerometry data 
can be processed by using established methods to stratify records 
by levels of physical activity. We have considered periods of at 
least 20 min of zero accelerometry counts as non-wearing, while 
days with at least 8 h of wearing time are considered valid (33, 
34). Within valid days, wearing and non-wearing intervals are dif- 
ferentiated following the approach described by Matthews and 
colleagues (35); intervals of at least 60 min of zero counts, with 
no more than two consecutive minutes of 1-50 counts per epoch 
(using a 30-s epoch), are coded as non-wearing. Sustained bouts of 
physical activity are defined as having accelerometry epochs above 
a threshold of 1000 counts per minute for at least 5 min, with 
allowance for 2 min of interstitial epochs below the threshold. To
identify walking bouts, we use a threshold of 1000 counts per minute,
which is lower than the thresholds commonly used to represent
moderate-to-vigorous physical activity (11).
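
The bout definition can be illustrated with a minimal Python sketch. It is not our production algorithm (see reference 32 for that), and it rests on several assumptions stated here: one count value per minute, a value equal to the threshold is treated as meeting it, the 2-min allowance is interpreted as at most two consecutive minutes below the threshold, and a bout must begin and end on a minute that meets the threshold.

```python
def find_bouts(counts_per_min, threshold=1000, min_len=5, max_gap=2):
    """Return (start, end) index pairs, end exclusive, of sustained bouts."""
    bouts = []
    start = None     # index of the first qualifying minute of the open bout
    last_hit = None  # index of the most recent qualifying minute
    for i, value in enumerate(counts_per_min):
        if value >= threshold:
            if start is None:
                start = i
            last_hit = i
        elif start is not None and i - last_hit > max_gap:
            # The gap exceeded the allowance: close the bout at the last qualifying minute.
            if last_hit - start + 1 >= min_len:
                bouts.append((start, last_hit + 1))
            start = None
    # Close a bout that is still open at the end of the record.
    if start is not None and last_hit - start + 1 >= min_len:
        bouts.append((start, last_hit + 1))
    return bouts


# Minutes 1-6 form one bout despite a 2-min dip; minutes 10-11 are too short.
series = [0, 1200, 1500, 300, 200, 1100, 1300, 0, 0, 0, 1400, 1250, 0]
bouts = find_bouts(series)
print(bouts)  # [(1, 7)]
```

Other readings of the allowance (for example, a cap on the total number of sub-threshold minutes per bout) would require only a change to the gap test.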

Classification of walking type from physical activity bouts. Processing
accelerometry data alone allows us to identify bouts of
physical activity and their relative intensity, but it provides no
additional information on bout characteristics. Integrating GPS
and travel diary data adds substantial power to contextualize
physical activity bouts. GPS can characterize both the instantaneous
speed and spatial clustering of individual locations within
bouts. For example, place names, activity types, and transportation
modes recorded in the travel diary can be used in conjunction
with accelerometry and GPS data for fine-grained classification of
walking types (32).

Table 1 | Illustration of original and rounded timestamps from one
dataset.

Rec.  Time_gps_utc         Time_gps_utc_std     Diff. time (s)
1     2009-01-16 02:22:52  2009-01-16 02:23:00   -8
2     2009-01-16 02:23:22  2009-01-16 02:23:30   -8
3     2009-01-16 02:23:52  2009-01-16 02:24:00   -8
4     2009-01-16 02:24:22  2009-01-16 02:24:30   -8
5     2009-01-16 02:24:52  2009-01-16 02:25:00   -8
6     2009-01-16 02:25:22  2009-01-16 02:25:30   -8
7     2009-01-16 02:25:52  2009-01-16 02:26:00   -8
8     2009-01-16 02:26:22  2009-01-16 02:26:30   -8
9     2009-01-16 02:26:52  2009-01-16 02:27:00   -8
10    2009-01-16 02:27:22  2009-01-16 02:27:30   -8
11    2009-01-16 02:27:52  2009-01-16 02:28:00   -8
12    2009-01-16 02:28:35  2009-01-16 02:28:30    5
13    2009-01-16 08:44:33  2009-01-16 08:44:30    3
14    2009-01-16 08:45:06  2009-01-16 08:45:00    6
15    2009-01-16 08:45:39  2009-01-16 08:45:30    9
16    2009-01-16 08:46:12  2009-01-16 08:46:00   12
17    2009-01-16 08:46:45  2009-01-16 08:47:00  -15
18    2009-01-16 08:47:18  2009-01-16 08:47:30  -12
19    2009-01-16 08:47:51  2009-01-16 08:48:00   -9
20    2009-01-16 08:48:24  2009-01-16 08:48:30   -6

Columns represent sequential record number, global positioning systems (GPS)
measurement time in UTC (Coordinated Universal Time), GPS measurement
time rounded to a standard 30 s interval, and the difference between raw and
standardized timestamps.

APPLICATIONS TO RESEARCH 

In this section, we describe two ongoing studies in which we apply 
our suite of tools to spatially oriented research questions. One 
study involves a natural experiment to evaluate the effects of a new
transportation system on physical activity levels. The other study
evaluates neighborhood effects on physical activity, using identi- 
cal twins as quasi-causal controls to overcome the self-selection 
and reverse causation problems that plague the literature on this 
topic. 

TRAVEL ASSESSMENT AND COMMUNITY 

The Travel Assessment and Community (TRAC) study focuses on
public transit use. Public transit users tend to engage in higher 
levels of physical activity than non-users. However, we want to 
know if users' physical activity is directly attributable to transit 
use and/or changes in transit access. To address these questions, 
we need data that can tell us when study participants use transit 
and what kind of behavior they exhibit before and after transit 
trips. We hypothesize that they will walk to and from the points 
where they access public transit. Therefore, we need to determine 
whether physical activity that happens in the temporal vicinity of
transit trips is consistent with walking or with some other form
of activity, such as working out at a gym. Based on the methods 
described here, and reported by us recently (32), we have suc- 
cessfully used the LifeLog to identify the time, place, and type of 
physical activity performed for participants in the TRAC study, 
and to make estimates of the amount of physical activity directly 
attributable to transit use as described by us in a paper currently 
in press (36). 

In the first longitudinal measurement phase in the TRAC study, 
we recruited 748 participants who had recorded data for any of the 
three instruments. Compliance with completing the measures was relatively
high; 715 participants had at least some data from each of
the three instruments. Of the 701 participants with accelerometer
and GPS data on valid days, there was a mean of 12.3 accelerometer
wearing hours per day (SD, 1.6 h) and 11.3 GPS hours (SD,
7.3 h). The mean accelerometer wear time was slightly lower
than reported in several other studies (between 12.5 and 14.2 h
per day) (7, 8, 37); however, GPS wear times were not usually
reported in those studies. Some participants who did not satisfactorily complete
data collection were asked to re-wear the devices and fill out travel 
diaries for additional days; accelerometer data were collected from 
730 participants, with 49 participants (6.7%) providing re-wear 
accelerometry data. 

TWIN STUDY OF ENVIRONMENT, LIFESTYLE BEHAVIORS, AND HEALTH 

Since 2008, all residential addresses for adult twins who are mem- 
bers of the University of Washington Twin Registry (UWTR) 
have been stored in a central database to enable temporal and 
spatial matching with survey data. The Registry is now poised 
to take advantage of the array of data assembled over the 
past several years in analyses of associations among genetic, 
environmental, behavioral, and health variables. Such analyses 
depend on linking all our available data types (survey, bio- 
logical, and environmental). Because twin participants in the 
Registry are surveyed every 2 years, we are also able to follow 
them longitudinally to investigate temporal associations between 
changes in built and social environments and changes in activity 
behaviors. 

Each individual twin's home address is geocoded in ArcGIS
by using ESRI (Redlands, CA, USA) StreetMap Premium with a
minimum match score of 100%. Addresses that fail the automatic
geocoding process (~40%) are matched manually. The following
are examples of the environmental exposures we use in our
research: neighborhood walkability (22, 27, 38-44), level of urban
sprawl (45), amount of vegetation or "green space" (46, 47), material
and social deprivation (48), residential property values (49,
50), and crime rates (51). These indices rely on multiple data
sources, including the U.S. Census, parcel-level and tax-lot level
data, county-level assessor data, and InfoUSA, a commercially
available resource that provides information on food sources
as well as fitness, service, and retail facilities. Point-in-polygon
analysis attaches values from our environmental indices to each
twin by using the twin's geocoded residential address. Although
much of our environmentally based work has focused on the residential
neighborhood, newer studies such as the one described in
the paragraph below also include data on the work and school
environment, as well as "distal" environments that participants
might frequent on a regular basis (e.g., a favorite coffee shop or
a gym). Thanks to the novel applications on which we
focus in this article, we can now exploit the full activity space
over time.

FIGURE 9 | An illustration of LifeLog View for a single bout of activity for
one subject. Left panel: time-indexed combined accelerometry counts (black
lines), global positioning system (GPS) speed (cyan lines), and place and trip
data from a travel diary (gray boxes with text labels). A physical activity bout is
centered in the graph within a white background. The dashed horizontal line
indicates the accelerometry threshold used to define a bout of walking. The
dashed vertical lines are at 10 min on either side of the bout. Right panel: map
of bout location. Larger red dots are individual GPS locations for the bout,
with all GPS locations shown as small blue dots. The green circle indicates the
boundary for the 95% most tightly clustered points in the bout.
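
The point-in-polygon step mentioned above can be illustrated with a minimal Python sketch based on the standard ray-casting test. It is not the ArcGIS procedure we used; the zone identifiers, index values, and coordinates are invented for illustration, and points lying exactly on a polygon boundary are not treated specially.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: count how many polygon edges a horizontal ray
    starting at (x, y) and pointing toward +x crosses; odd means inside."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # the edge straddles the ray's y coordinate
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def attach_index(x, y, zones):
    """Return the index value of the first zone containing the point, or None."""
    for _zone_id, value, polygon in zones:
        if point_in_polygon(x, y, polygon):
            return value
    return None


# Hypothetical zones: (identifier, index value, polygon vertices).
zones = [
    ("zone_A", 0.82, [(0, 0), (10, 0), (10, 10), (0, 10)]),
    ("zone_B", 0.35, [(10, 0), (20, 0), (20, 10), (10, 10)]),
]
home_value = attach_index(5, 5, zones)
print(home_value)  # 0.82
```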

In this research, we will investigate the effects of the built 
environment on lifestyle behaviors and health in a community- 
based sample of 200 adult monozygotic twin pairs (400 individ- 
uals) from the UWTR who were reared together but now live 
apart. This unique sample will permit us to examine environ- 
mental influences on lifestyle behaviors and health, free of the 
genetic and shared environmental (familial) effects that might 
otherwise introduce selection biases into the choice of living environments.
We describe each twin's residential environment in
terms of the indices previously noted. Participants are outfitted
with an accelerometer, a GPS data logger, and an Android smartphone
for continuous tracking in time and space over 2 weeks.
The data from these three tools are joined in a LifeLog indexed by
common timestamps across devices. An example of data collected
for one twin pair is shown in Figure 11. LifeLog data will assist
us in investigating multiple issues. For example, we will determine
the association between the home built environment and
levels of both walking and total physical activity in twins who
live apart. We will also compare location-based physical activity
and eating episodes in real time to assess whether proximity to
features of the home built environment is associated with use,
by measuring how many physical activity and eating episodes
occur in the home built environment versus in distal built environments,
including work, transit, and recreation-related settings.
Our study design is notable in several ways: it overcomes the
measurement bias inherent in self-report data, addresses the problem
of defining "neighborhood," and uses novel spatial-temporal
measures of behaviors that correspond to ecological
exposures.

To date, we have completed data collection on 70 twin pairs. 
Compliance with wearing the devices has been exceptional; of 
106 individual twins whose data have been processed thus far,
the mean number of wearing days for the accelerometer and GPS is 13.8 (out
of 14 days of measurement). The mean numbers of wearing days for the
mobile phone with entries for MyFootprints and Travel Diary are
6.6 and 7 days, respectively (out of 7 days of measurement for each
program). Of course, when we enter the data analysis phase our
group will need to determine the actual number of valid wear days
based on the number of valid hours for each day for each device.
Nonetheless, this preliminary "peek" at the data on wear time is
promising.

LIMITATIONS 

Our methods and analyses are based on objective measures of
location and physical activity; however, the devices and basic data
processing methods for these are not perfect. GPS data of sufficient
duration and quality are challenging to obtain given such inherent
problems as urban canyons, cold starts, and limited battery life. When GPS
data are not present, it is not possible to determine whether data
loss was due to power being turned off or loss of signal. Likewise,
when a GPS unit is powered on and recording, but not worn
(e.g., recording when charging overnight), data will be logged even
though these will not reflect actual movement patterns. Newer
generation locational technologies using combined GPS and WiFi
triangulation together with other sensors for detecting movement
through space are likely to provide better locational data in
the near future.

FIGURE 10 | Example LifeLog data collected over 1 week for a single
subject. Graphs (right) indicate accelerometry counts in cyan, GPS speed in
magenta, places and trips (gray boxes), and physical activity bouts (black
rectangles) for each day. The map (left) shows all GPS points in magenta and
those that occurred within a physical activity bout in cyan, as well as home
and work locations ("H" and "W" markers). A summary of demographic
characteristics and overall subject-level activity, as well as a close-up of graph
data, are shown in the callouts.

Capturing behavior through time is problematic; we still rely on
participants to record their travel and activity behavior. Although
the use of smartphones as diary recording devices provides benefits
such as automatic time-stamping of recorded activities and
eliminating the need for data transcription, there is still a relatively high
participant burden to enter travel and activity information, regardless
of the instrument used. Several investigators are exploring
the use of portable cameras to capture periodic images for use
in activity classification (52-54); however, such methods rely on
manual annotation of images, which is a tedious and lengthy
process. At this time, it is unknown when a reliable method for
automatically classifying behaviorally defined activity types will
be developed.

CONCLUSION 

In summary, there is growing interest in obtaining more precision
and more information about the amount, type, and context
of physical activity and other health behaviors. Newer devices
(e.g., portable GPS) and their combined use offer opportunities to
improve precision and collect this additional information. However,
standard methods and procedures are needed to best capture
and integrate the large volume of data obtained from these devices.
The integrative characteristics of the large datasets contained in
LifeLogs and SmartMaps hold great promise for advancing spatial
epidemiologic research, especially work whose goal is to facilitate
behaviors that promote health.

FIGURE 11 | One-day LifeLogs collected from a twin pair. The upper
panel shows self-reported data on place (red) and trip (green) from the
travel diary along with objective accelerometry counts (magenta) and
GPS velocity (blue). The lower panel maps travel patterns for the
same individuals over the same days, with markers shown at hourly
intervals.